Stochastic EM-based TFBS motif discovery with MITSU
Authors
Abstract
MOTIVATION: The Expectation-Maximization (EM) algorithm has been successfully applied to the problem of transcription factor binding site (TFBS) motif discovery and underlies the most widely used motif discovery algorithms. In the wider field of probabilistic modelling, the stochastic EM (sEM) algorithm has been used to overcome some of the limitations of the EM algorithm; however, the application of sEM to motif discovery has not been fully explored.
RESULTS: We present MITSU (Motif discovery by ITerative Sampling and Updating), a novel algorithm for motif discovery. MITSU combines sEM with an improved approximation to the likelihood function that is unconstrained with regard to the distribution of motif occurrences within the input dataset. The algorithm is evaluated quantitatively on realistic synthetic data and on several collections of characterized prokaryotic TFBS motifs, and is shown to outperform EM and an alternative sEM-based algorithm, particularly in terms of site-level positive predictive value.
AVAILABILITY AND IMPLEMENTATION: Java executable available for download at http://www.sourceforge.net/p/mitsu-motif/, supported on Linux/OS X.
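To make the sampling-then-updating idea concrete, the following is a minimal sketch of a generic stochastic EM loop for motif discovery. It is not the MITSU implementation: it assumes exactly one motif occurrence per sequence (a constraint the abstract says MITSU does not impose), a fixed background model, and uppercase A/C/G/T input. The names `estimate_pwm`, `site_likelihood_ratio` and `stochastic_em` are hypothetical, chosen only for this illustration.

```python
import random

ALPHABET = "ACGT"


def estimate_pwm(seqs, starts, width, pseudocount):
    """M-step: re-estimate motif column frequencies from the chosen sites."""
    pwm = []
    for offset in range(width):
        column = {base: pseudocount for base in ALPHABET}
        for seq, start in zip(seqs, starts):
            column[seq[start + offset]] += 1
        total = sum(column.values())
        pwm.append({base: column[base] / total for base in ALPHABET})
    return pwm


def site_likelihood_ratio(seq, start, pwm, background):
    """Motif-vs-background likelihood ratio for the window beginning at `start`."""
    ratio = 1.0
    for offset, column in enumerate(pwm):
        base = seq[start + offset]
        ratio *= column[base] / background[base]
    return ratio


def stochastic_em(seqs, width, iterations=200, pseudocount=0.5, seed=0):
    """Toy sEM under the assumption of one site per sequence (not MITSU itself)."""
    rng = random.Random(seed)

    # Background model: overall base frequencies, kept fixed for simplicity.
    counts = {base: pseudocount for base in ALPHABET}
    for seq in seqs:
        for base in seq:
            counts[base] += 1
    total = sum(counts.values())
    background = {base: counts[base] / total for base in ALPHABET}

    # Initialise with a random site in each sequence.
    starts = [rng.randrange(len(seq) - width + 1) for seq in seqs]
    pwm = estimate_pwm(seqs, starts, width, pseudocount)

    for _ in range(iterations):
        # S-step: sample each site position from its posterior, which is
        # proportional to the likelihood ratio under a uniform position prior.
        starts = []
        for seq in seqs:
            weights = [
                site_likelihood_ratio(seq, i, pwm, background)
                for i in range(len(seq) - width + 1)
            ]
            starts.append(rng.choices(range(len(weights)), weights=weights)[0])
        # M-step: update the motif model from the sampled sites.
        pwm = estimate_pwm(seqs, starts, width, pseudocount)

    return pwm, starts
```

The only difference from deterministic EM in this sketch is the S-step: instead of weighting every candidate position by its posterior probability, one position per sequence is drawn from that posterior, so the estimate can move away from a poor starting point. A call such as `stochastic_em(["ACGTTGACAAC", "TTGACATTTTT"], width=6)` returns the final position weight matrix and the last sampled site positions.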
Similar resources
Development of an Efficient Hybrid Method for Motif Discovery in DNA Sequences
This work presents a hybrid method for motif discovery in DNA sequences. The proposed method, called SPSO-Lk, borrows the concept of Chebyshev polynomials and uses stochastic local search to improve the performance of the basic PSO algorithm as a motif finder. The Chebyshev polynomial concept encourages us to use a linear combination of previously discovered velocities beyond that proposed b...
On the detection and refinement of transcription factor binding sites using ChIP-Seq data
Coupling chromatin immunoprecipitation (ChIP) with recently developed massively parallel sequencing technologies has enabled genome-wide detection of protein-DNA interactions with unprecedented sensitivity and specificity. This new technology, ChIP-Seq, presents opportunities for in-depth analysis of transcription regulation. In this study, we explore the value of using ChIP-Seq data to better ...
Finding subtypes of transcription factor motif pairs with distinct regulatory roles
DNA sequences bound by a transcription factor (TF) are presumed to contain sequence elements that reflect its DNA binding preferences and its downstream-regulatory effects. Experimentally identified TF binding sites (TFBSs) are usually similar enough to be summarized by a 'consensus' motif, representative of the TF DNA binding specificity. Studies have shown that groups of nucleotide TFBS varia...
Hybrid Gibbs-sampling algorithm for challenging motif discovery: GibbsDST.
The difficulties of computational discovery of transcription factor binding sites (TFBS) are well represented by (l, d) planted motif challenge problems. Large d problems are difficult, particularly for profile-based motif discovery algorithms. Their local search in the profile space is apparently incompatible with subtle motifs and large mutational distances between the motif occurrences. Here...
Bioinformatics Approach for Transcription Factor Binding Properties
A random forest classifier for transcription factor binding properties (Fig. 1). One of the central questions in molecular genetics regards the mechanisms of transcriptional regulation, particularly how transcription factors (TFs) regulate expression of target genes with specific TF binding sites (TFBSs). Identifying TFBSs would permit a more comprehensive and quantitative mapping of the reg...